REPORT  DOCUMENTATION  PAGE 


AFRL-SR-AR-TR-02- 


1 Summary

Funds were received in April 2001 under the Department of Defense DURIP program for
construction of a 48 processor high performance computing cluster. This report details the hardware
that was purchased and how it has been used to enable and enhance research activities directly
supported by, and of interest to, the Air Force Office of Scientific Research and the Department
of Defense. The report is divided into two major sections. The first section after this summary
describes the computer cluster, its setup, and some cluster performance benchmark results. The
second section explains ongoing research efforts that have benefited from the cluster hardware,
and presents highlights of those efforts since installation of the cluster.

2  Cluster  Construction  and  Benchmarking 

The  original  proposal  requested  funds  for  24  dual-processor  computing  nodes  along  with  a  high 
bandwidth  interconnect  and  other  supporting  hardware.  The  author  was  able  to  combine  the 
granted  DURIP  funds  with  funds  from  an  independent  research  grant  awarded  to  investigators 
in  the  Chemical  Engineering  and  Mechanical  Engineering  departments  at  Stanford  University,  and 
build  a  larger  machine  which  met  the  needs  of  both  groups.  This  combination  of  funding  allowed 
for  price/performance  ratio  benefits  through  economy  of  scale  and  bulk  order  pricing.  The  result 
was  “whitehot,”  a  rack  mounted,  112  processor,  high  performance  computing  cluster,  shown  in 
Figure  1.  The  DURIP  project  group  maintains  control  and  exclusive  use  of  48  of  the  available 
processors,  with  the  balance  of  resources  allocated  to  the  collaborating  group. 
Figure 1: Photograph of the 56 node rack mounted cluster, named “whitehot.”
Component

Penguin Computing Niveus Full Tower Server
Penguin Computing Relion 110 Dual Pentium III 1 GHz Compute Node
Dolphin Wulfkit 64 bit/66 MHz PCI adaptor card
Serial, Ethernet, and Dolphin Cable Kit
APC Vertical Mount Power Distribution Unit
Cyclades 32 Port Terminal Server
42U Enclosure with Ceiling Fans
Intel Express 530T 72 Port Fast Ethernet Switch

Table 1: Hardware list for the purchased cluster configuration.


Figure  2:  DNS  benchmark  problem  results  for  one  process  per  node  (1  ppn)  and  two  processes  per 
node  (2  ppn). 

Several hardware vendors were considered before the final decision was made to purchase from
Penguin Computing in late May 2001. The hardware for the entire purchased cluster configuration
is shown in Table 1. Each compute node contains two 1 GHz Pentium III processors, 1 Gbyte
of PC133 RAM, and a 30 GB EIDE hard disk, giving a cluster theoretical peak performance
of 112 gigaflops. Internode communication is handled by Dolphin Wulfkit 64 bit/66 MHz PCI
cards installed in each of the nodes. The Dolphin PCI cards are connected via special cables in a
two dimensional torus topology. The Dolphin Wulfkit system is capable of providing interprocess
bandwidth in excess of 150 Mbyte/sec for point-to-point communication using the message passing
interface (MPI) software layer. The master node is a dual processor Pentium III machine which
contains a total of 290 Gbytes of local hard disk space. A local fast Ethernet network connected
by an Intel Express 530T rack-mounted switch provides an additional communication layer for
system maintenance, system monitoring, and user access. System hardware is covered by two
years of replacement coverage and onsite support.
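
The quoted peak figure and the torus wiring can be illustrated with a short sketch. The 112 gigaflops value is consistent with counting one floating point operation per clock cycle on each of the 112 processors; that per-cycle rate is an inference, not a number stated in this report. The report also does not give the dimensions of the torus, so the 7 by 8 layout used below for the 56 nodes is purely hypothetical, as are the function names.

```python
# Sketch only: the flops-per-cycle value and the 7 x 8 torus shape are
# assumptions made for illustration, not figures taken from the report.

def peak_gflops(processors: int, clock_ghz: float, flops_per_cycle: float = 1.0) -> float:
    """Theoretical peak in gigaflops: processors x clock rate x flops per cycle."""
    return processors * clock_ghz * flops_per_cycle


def torus_neighbors(node: int, rows: int, cols: int) -> dict:
    """Return the four wrap-around neighbors of `node` on a rows x cols torus.

    Nodes are numbered row by row, starting from 0. Each index wraps around
    at the edges, which is what distinguishes a torus from a plain mesh.
    """
    if not 0 <= node < rows * cols:
        raise ValueError("node index out of range")
    r, c = divmod(node, cols)
    return {
        "north": ((r - 1) % rows) * cols + c,
        "south": ((r + 1) % rows) * cols + c,
        "west": r * cols + (c - 1) % cols,
        "east": r * cols + (c + 1) % cols,
    }


if __name__ == "__main__":
    # 112 processors at 1 GHz, one flop per cycle (assumed)
    print(peak_gflops(112, 1.0))
    # Neighbors of the corner node on a hypothetical 7 x 8 torus of 56 nodes
    print(torus_neighbors(0, rows=7, cols=8))
```

In a torus every node has the same number of direct neighbors, so corner nodes are not disadvantaged relative to interior nodes.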

All hardware was installed and originally configured by Penguin Computing. Penguin also
installed the Linux operating system, Scali cluster management software, the Scali MPI
implementation, and the OpenPBS queue management system. Software added after the initial setup
included the Portland Group FORTRAN 77 and Fortran 90 compilers as well as the BLAS,
LAPACK, and FFTW math libraries. The hardware setup was completed by 27 July 2001, with
software installation, cluster testing, and benchmarking taking place during August 2001.
Figure 2 shows the results from a benchmark test consisting of a number of calls to a routine which
constructs a numerical approximation to the spatial terms appearing in the compressible
Navier-Stokes equations. The computations performed in this benchmark are typical of the calculations
required during a high fidelity direct numerical simulation of an unsteady fluid flow. Speedup for
this benchmark was nearly linear for one process per node calculations. When two processes per
node were used, performance per processor dropped by about 30-35% due to PCI and memory
bus sharing between processes. However, this performance penalty is offset by the network and
enclosure cost savings inherent in a dual processor cluster design.
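
The trade-off described above can be put in numbers with a minimal sketch. It assumes that the 30-35% figure is a uniform loss on each of the two processors in a node; the function name is illustrative and the result is an estimate, not a measured benchmark value.

```python
# Sketch only: assumes the quoted per-processor loss applies equally to
# every process on the node.

def node_throughput_ratio(per_processor_drop: float, processes_per_node: int = 2) -> float:
    """Throughput of a node running several processes, relative to the same
    node running a single process.

    per_processor_drop is the fractional loss per processor (0.30 means 30%).
    """
    if not 0.0 <= per_processor_drop < 1.0:
        raise ValueError("per_processor_drop must be in [0, 1)")
    return processes_per_node * (1.0 - per_processor_drop)


if __name__ == "__main__":
    for drop in (0.30, 0.35):
        ratio = node_throughput_ratio(drop)
        print(f"{drop:.0%} loss per processor -> {ratio:.2f}x node throughput")
```

Under this assumption a node running two processes still delivers roughly 1.3 to 1.4 times the work of the same node running one process, which is why the second processor remains worthwhile despite the bus contention.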


3  Cluster  Applications 

High Speed Turbulent Jets

The suppression of noise produced by the exhaust gases from high thrust jet engines continues to be
an important issue for civil and military applications. The Federal Aviation Administration has set
stringent ‘Stage 3’ noise requirements on new commercial aircraft entering service and is planning
for the even more stringent ‘Stage 4’ noise requirements. To aid in developing the new technology
needed for this noise suppression, NASA has begun aircraft noise research programs to reduce
aircraft noise by 10 dB by 2007 and by 20 dB by 2022. In military applications, shock-associated
jet noise has been identified as a source of structural fatigue in the aft sections of fighter aircraft.
One particular example is the damage to the tail sections of F-15s caused by jet screech, a discrete
component of shock-associated jet noise.
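
To give a sense of scale for the 10 dB and 20 dB targets, the sketch below converts a decibel reduction into the corresponding ratios of acoustic power and of sound pressure amplitude, using the standard decibel definitions. The function names are illustrative and nothing here is taken from the NASA programs themselves.

```python
# Sketch only: standard decibel relations, applied to the reduction
# targets quoted in the text.

def power_ratio_from_db(reduction_db: float) -> float:
    """Factor by which acoustic power falls for a given dB reduction."""
    return 10.0 ** (reduction_db / 10.0)


def pressure_ratio_from_db(reduction_db: float) -> float:
    """Factor by which sound pressure amplitude falls for a given dB reduction."""
    return 10.0 ** (reduction_db / 20.0)


if __name__ == "__main__":
    for target in (10.0, 20.0):
        print(
            f"{target:.0f} dB: power / {power_ratio_from_db(target):.0f}, "
            f"pressure / {pressure_ratio_from_db(target):.2f}"
        )
```

A 10 dB reduction therefore corresponds to a tenfold drop in radiated acoustic power, and a 20 dB reduction to a hundredfold drop.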

The hardware provided through the DURIP funds directly supports research to understand the
physics of the noise produced by sub- and supersonic jets. One such project is the numerical
simulation of a Mach 0.9 turbulent jet by large eddy simulation (LES), whose goal is to define the
state-of-the-art in LES predictions of jet noise. Determining the capabilities and limits of LES
in noise prediction makes available a design and engineering tool that, to date, has not existed
in industry. The high fidelity predictions can be completed in a reasonable amount of time
(relative to the typical design cycle), so that design-relevant trends can be determined.

A typical example is shown in Figure 3, in which the sound radiated from a turbulent jet is clearly
visible. Using the information available in a numerical simulation, the noise from the jet can be
extrapolated to an observer far away from the jet in a manner that more realistically represents
the actual design problem. These results were obtained from a numerical calculation of
approximately three weeks in length using 32 processors and the Message Passing Interface (MPI).
More detailed investigation is desired in an effort to locate the physical mechanisms of noise
generation and relate them to possible design changes, but is currently prohibited by the limited
amount of storage space available for the time dependent three dimensional flow fields needed.

Shock-associated noise prediction work is also being facilitated by the DURIP-provided
computer cluster. In this work, the ability of LES to predict the broadband shock-associated noise is
being tested by comparison with available direct numerical simulation (DNS) data. Results from
the research, a portion of which was reported in Bodony and Lele (2002), are being used to
improve the quality of noise predictions through the development of new subgrid scale models in the
LES formulation. Improved models developed in this work will be re-applied to the
aforementioned turbulent jet noise problem.
Figure 3: Flowfield snapshot from LES of a turbulent jet, showing turbulent vorticity fluctuations
in the jet (colors) creating acoustic waves which radiate to the farfield (greyscale).

Large Eddy Simulation of Turbine Blade Heat Transfer

Performance  of  gas  turbine  engines  is  limited  in  a  fundamental  way  by  the  maximum  allowable 
turbine  inlet  temperature,  which  is  related  to  the  efficiency  of  individual  turbine  blade  heat  transfer 
characteristics.  The  objective  of  this  research  is  to  develop  large  eddy  simulation  (LES)  as  a  tool 
for  heat  transfer  prediction  over  a  turbine  blade  immersed  in  a  hot  stream  containing  free-stream 
turbulence  (FST). 

Recent progress in algorithm development has been made in applying LES to this problem,
particularly in the areas of specification of realistic free-stream turbulence in the computational inflow
region, and in development of efficient numerical schemes for LES at the desired flow conditions.
One set of computations completed to date explores heat transfer characteristics in the leading
edge region of a turbine blade with oncoming FST. Figure 4 shows snapshots of instantaneous heat
transfer rate on the elliptic leading edge surface, which is viewed from an oblique head-on angle.
This calculation required a mesh of 191 by 144 by 48 grid points over a domain of size 3.3D by
5.0D by 0.4D, where D is the leading edge diameter of curvature. The free stream Mach number is
$M_\infty = 0.15$, the FST intensity is $u'/U_\infty = 0.06$, and the Reynolds number is $Re_D = 42000$. Details
of the numerical method used in this calculation were reported in Xiong and Lele (2001); the code
executes in parallel and runs efficiently on the DURIP computing cluster. Figure 5 shows the good
agreement obtained between the LES and the experiment of Van Fossen et al. [1], performed at
the same flow conditions and with the same geometry as the computation.
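
The size of this calculation can be estimated from the quoted mesh dimensions. The sketch below counts the grid points and gives a rough storage figure for one instantaneous flow field; the choice of five stored variables per point and 8-byte double precision values are assumptions for illustration only, since the report does not describe the output format.

```python
# Sketch only: the mesh dimensions come from the text; the number of stored
# variables and the bytes per value are assumptions.
from math import prod


def total_grid_points(shape: tuple) -> int:
    """Number of grid points in a structured mesh of the given shape."""
    return prod(shape)


def snapshot_bytes(shape: tuple, variables: int = 5, bytes_per_value: int = 8) -> int:
    """Approximate storage for one instantaneous flow field snapshot."""
    return total_grid_points(shape) * variables * bytes_per_value


if __name__ == "__main__":
    mesh = (191, 144, 48)
    print(f"grid points: {total_grid_points(mesh):,}")
    print(f"one snapshot: {snapshot_bytes(mesh) / 2**20:.1f} MiB (assumed layout)")
```

Estimates of this kind indicate how quickly disk space is consumed when many time dependent three dimensional fields are saved.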

Computational  results  such  as  those  shown  in  Figure  4  are  being  used  to  enhance  understanding 
of  the  dynamic  processes  important  in  leading  edge  heat  transfer  in  the  presence  of  turbulence.  For 
example,  in  the  figure,  regions  of  enhanced  heat  transfer  appear  as  broad  streaks  on  the  wall. 
These  streaks  are  associated  with  FST  eddies  which  are  stretched  as  they  near  the  leading  edge. 
Observations  such  as  these  have  led  to  a  new  theoretical  analysis  of  the  impact  of  stretched  FST 
eddies  on  the  wall  heat  transfer  in  a  stagnation  point  boundary  layer,  detailed  in  Xiong  and  Lele 
(2002). 
Figure 4: Snapshots of instantaneous heat transfer rate on a turbine blade leading edge surface.


Figure 5: Comparison of the surface distribution of heat transfer (Fr = Frössling number) for
laminar and turbulent calculations, versus experiment.
Computation of Mixing Layer Receptivity

Excitation of shear flows by sound is an important element in naturally occurring aeroacoustic
feedback in flows such as supersonic screeching jets. The process by which energy is transferred
from incident sound waves into hydrodynamic instabilities in a flow is called receptivity.
Understanding of the receptivity process also opens the possibility of active control of shear flows using
sound. Existing receptivity theories are restricted to low temporal frequencies and linear (small)
disturbances. The objective of this research is to explore mixing layer receptivity outside of these
parameter limitations using a computational approach.

Figure 6 shows a schematic of the present computational model problem. A thin splitter plate
separates two streams of fluid which form a mixing layer downstream of the plate trailing edge.
Incident acoustic waves scatter at the trailing edge, creating instability waves which grow
exponentially downstream until nonlinear processes lead to vortex roll-up in the mixing layer. Figure 7
shows an example of an instantaneous picture of the computed acoustic field with superimposed
contours of vorticity. The mixing layer is supersonic ($M_1 = 1.2$), and the incident sound field
consists of nonlinear plane acoustic waves with an associated disturbance pressure which is one
percent of the ambient pressure. A complex wave pattern is observed downstream of the splitter
plate, where sound waves have scattered from the vortices which have formed in the mixing layer.
Quantitative data from this type of simulation allowed us to characterize the receptivity of this flow
when the sound is very loud and the assumption of linearity may not hold. This simulation was
performed on a grid with dimensions 1600 x 500, and required about 90 hours of wall clock time
using 20 processors of the DURIP computing cluster. Details of the numerical method may be
found in Barone and Lele (2002a) and results of further computations were reported in Barone and
Lele (2002b).
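
The computational cost quoted for this simulation can be summarized with a small sketch. It uses only the grid size, wall clock time, and processor count given above; the assumption that grid points are divided evenly among processors is ours, and the function names are illustrative.

```python
# Sketch only: inputs are the figures quoted in the text; the even split of
# grid points across processors is an assumption.

def processor_hours(wall_clock_hours: float, processors: int) -> float:
    """Total processor time consumed by a parallel run."""
    return wall_clock_hours * processors


def points_per_processor(nx: int, ny: int, processors: int) -> float:
    """Average number of grid points handled by each processor."""
    return nx * ny / processors


if __name__ == "__main__":
    print(f"processor-hours: {processor_hours(90, 20):,.0f}")
    print(f"grid points per processor: {points_per_processor(1600, 500, 20):,.0f}")
```

The same bookkeeping applies to other runs on the cluster once their wall clock time and processor count are known.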
Figure 7: Calculation of the nonlinear receptivity of a supersonic mixing layer. The grey scale
contours are dilatation; the red contours are vorticity.